Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance

Authors

  • Tsuhan Chen
  • Alex Hauptmann
  • Michael Witbrock
  • Matthew A. Siegler
Abstract

Traditionally, indexing and searching of speech content in multimedia databases have been achieved by combining separately constructed speech recognition and information retrieval engines. Although each technology has a legacy of research, only recently have efforts been made to study the potential suboptimality of this strategy, and none of these efforts specifically addresses the presence of uncertainty in automatically generated transcriptions. This research develops a refinement of the most common information retrieval relevance formula, TFIDF, to incorporate uncertainty as a retrieval feature, along with a set of techniques to acquire this uncertainty from the multiple hypotheses held in existing speech recognition data structures. In the process, more evidence is extracted than is available in the most likely transcription hypothesis alone, and overall retrieval precision and recall are improved. The term weighting scheme known as the inverse document frequency is shown to be a special case of the mutual information between the document set and the term, the former requiring a Boolean characterization of term occurrence and the latter permitting fractional probabilities. The relevance between a query and a document obtained from speech recognition is then modelled as a random variable arising from the statistical nature of the speech recognition system. The statistics of this model are derived from the word lattices and the N-Best lists output by the recognizer. In analyzing the word lattices, the path probabilities for each node are summed. The relative ranking of competing terms by these summed probabilities is shown to be indicative of the probability of term occurrence. A model of this relationship is used to predict term presence and term count, reducing the degradation in retrieval quality due to speech recognition by 24%.
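The relationship between inverse document frequency and an information measure can be illustrated with a small sketch. It is only one plausible reading of the claim, not the authors' derivation: it assumes a uniform prior over documents and defines the posterior p(d | t) as proportional to the probability that term t occurs in document d. Under those assumptions the divergence between posterior and prior equals log(N / n_t) when occurrence is Boolean, and it still yields a usable weight when occurrence probabilities are fractional. The function names `idf` and `term_information` are hypothetical.

```python
import math


def idf(n_docs, doc_freq):
    """Classic inverse document frequency, log(N / n_t)."""
    return math.log(n_docs / doc_freq)


def term_information(occurrence_probs):
    """Divergence between p(d | t) and a uniform prior p(d) = 1 / N.

    occurrence_probs[d] is the probability that the term occurs in
    document d. With Boolean values (0 or 1) the result equals idf();
    fractional values are an assumed generalization for uncertain
    transcripts.
    """
    n_docs = len(occurrence_probs)
    total = sum(occurrence_probs)
    if total <= 0:
        raise ValueError("term never occurs in the collection")
    info = 0.0
    for p in occurrence_probs:
        if p > 0:
            posterior = p / total  # p(d | t) under the stated assumption
            info += posterior * math.log(posterior * n_docs)
    return info


# Boolean occurrence: the term is present in 2 of 4 documents.
boolean_weight = term_information([1, 0, 0, 1])

# Uncertain occurrence: certain in one document, 50% likely in another.
fractional_weight = term_information([1.0, 0.5, 0.0, 0.0])
```

In the Boolean case `boolean_weight` matches `idf(4, 2)`. The fractional case falls between `idf(4, 2)` and `idf(4, 1)`, which is the behaviour one would want from a weight that treats the second document as only a partial occurrence.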
In a separate model, the Top-N distinct text-processed hypotheses from the word lattices are used to estimate the term probability and term count. This strategy reduces the degradation in retrieval quality due to speech recognition by 63%. Experiments were performed on a standardized test of broadcast news stories that had been transcribed manually and judged against a set of natural language queries.
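As a rough illustration of how term probability and term count might be estimated from a Top-N list, the sketch below normalizes hypothesis scores into posterior weights and accumulates them per term. The input layout (token list plus log score), the softmax-style normalization, and the name `nbest_term_stats` are assumptions made for this example; the abstract does not specify how the hypotheses are weighted.

```python
import math
from collections import Counter


def nbest_term_stats(nbest):
    """Estimate term presence probability and expected term count.

    nbest is a list of (tokens, log_score) pairs, one per distinct
    hypothesis. Scores are normalized to sum to one (an assumption),
    then each term accumulates the weight of the hypotheses that
    contain it (presence) and the weighted number of occurrences
    (expected count).
    """
    max_log = max(score for _, score in nbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - max_log) for _, score in nbest]
    norm = sum(weights)

    presence = Counter()
    expected_count = Counter()
    for (tokens, _), weight in zip(nbest, weights):
        prob = weight / norm
        for term, count in Counter(tokens).items():
            presence[term] += prob
            expected_count[term] += prob * count
    return dict(presence), dict(expected_count)


# Three made-up hypotheses for one spoken document.
hypotheses = [
    (["iran", "news", "news"], math.log(0.5)),
    (["iraq", "news"], math.log(0.3)),
    (["iran", "knows"], math.log(0.2)),
]
presence, expected_count = nbest_term_stats(hypotheses)
```

The expected counts can stand in for the term frequency in a TFIDF score, so a term that appears only in lower-ranked hypotheses still contributes fractional evidence instead of being lost.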

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Improving the Performance of a Continuous Speech Recognition System Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space

Designing new feature extraction methods for the speech signal and combining the information they yield is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Publication date: 1999